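Scene understanding is an active research area. Commercial depth sensors such as Kinect have enabled the release of several RGB-D datasets over the past few years, which spawned new methods in 3D scene understanding. More recently, with the launch of the LiDAR sensor in Apple's iPads and iPhones, people can access high-quality RGB-D data on devices they commonly use. This opens a whole new era for the computer vision community as well as for app developers. Fundamental research in scene understanding, together with advances in machine learning, can now affect people's everyday experiences. However, turning these scene understanding methods into real-world experiences requires additional innovation and development. In this paper we introduce ARKitScenes. It is not only the first RGB-D dataset captured with a now widely available depth sensor but, to the best of our knowledge, also the largest indoor scene understanding dataset released. In addition to the raw and processed data from the mobile device, ARKitScenes includes high-resolution depth maps captured with a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further analyze the usefulness of the data for two downstream tasks: 3D object detection and color-guided depth upsampling. We show that our dataset can help push the boundaries of existing state-of-the-art methods and that it introduces new challenges that better represent real-world scenarios.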
translated by Google Translate
In intensively managed forests in Europe, where forests are divided into stands of small size and may show heterogeneity within stands, a high spatial resolution (10 - 20 meters) is arguably needed to capture the differences in canopy height. In this work, we developed a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map over the "Landes de Gascogne" forest in France, a large maritime pine plantation of 13,000 km$^2$ with flat terrain and intensive management. This area is characterized by even-aged and mono-specific stands, with a typical length of a few hundred meters, harvested every 35 to 50 years. Our deep learning U-Net model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input to predict tree height derived from GEDI waveforms. The evaluation is performed with external validation data from forest inventory plots and a stereo 3D reconstruction model based on SkySat imagery available at specific locations. We trained seven different U-Net models based on combinations of Sentinel-1 and Sentinel-2 bands to evaluate the importance of each instrument in the dominant height retrieval. The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020 with a mean absolute error of 2.02 m on the test dataset. The best predictions were obtained using all available satellite layers from Sentinel-1 and Sentinel-2, but using only one satellite source also provided good predictions. For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.
A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes. We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io/
Semi-supervised anomaly detection is a common problem, as datasets containing anomalies are often only partially labeled. We propose a canonical framework, Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE), that is not limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, this assumption is violated in many applications: for example, the labeled data may contain only anomalies unlike the unlabeled data, the unlabeled data may contain different types of anomalies, or the labeled data may contain only 'easy-to-label' samples. SPADE uses an ensemble of one-class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling under distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial when labeled data is limited. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings, such as a model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.
The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones -- the average attention weights over multiple inputs. We use PAPA to analyze several established pretrained Transformers on six downstream tasks. We find that without any input-dependent attention, all models achieve competitive performance -- an average relative drop of only 8% from the probing baseline. Further, little or no performance drop is observed when replacing half of the input-dependent attention matrices with constant (input-independent) ones. Interestingly, we show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success. Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilization of this mechanism in the Transformer architecture.
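The core substitution this abstract describes, replacing input-dependent attention matrices with a constant average, can be illustrated with a toy example. The following is a minimal sketch and not the PAPA implementation: it assumes a single head, fixed-length sequences, and tokens that serve as queries, keys, and values at once. All function names and input values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(tokens):
    """Input-dependent weights: softmax(X X^T / sqrt(d)), one row per token."""
    scale = math.sqrt(len(tokens[0]))
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
            for q in tokens]

def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    count = len(matrices)
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / count for j in range(size)]
            for i in range(size)]

def mix(weights, tokens):
    """Combine token vectors using the given attention weights."""
    dim = len(tokens[0])
    return [[sum(w * t[k] for w, t in zip(row, tokens)) for k in range(dim)]
            for row in weights]

# Toy "dataset": three sequences of three 2-d token vectors (made-up numbers).
inputs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]],
    [[2.0, 1.0], [0.0, 0.0], [1.0, 0.0]],
]

# Constant matrix: the average attention weights over multiple inputs.
constant = average_matrices([attention_weights(x) for x in inputs])

# Same input, once with its own attention and once with the constant matrix.
dependent_out = mix(attention_weights(inputs[0]), inputs[0])
constant_out = mix(constant, inputs[0])
```

Because every input-dependent matrix is row-stochastic, their average is too, so the constant matrix can be dropped in without any renormalization. How the actual method handles multiple heads, layers, and varying sequence lengths is not stated in the abstract.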
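Hyperparameter optimization is the process of identifying an appropriate hyperparameter configuration for a given machine learning model. For smaller datasets, an exhaustive search is possible; however, as data size and model complexity increase, the number of configuration evaluations becomes the main computational bottleneck. A promising paradigm for tackling this type of problem is surrogate-based optimization. The main idea underlying this paradigm is an incrementally updated model of the relation between the hyperparameter space and the output (target) space; the data for this model are obtained by evaluating the main learning engine, for example a factorization machine-based model. By learning to approximate the hyperparameter-target relation, the surrogate (machine learning) model can be used to score large numbers of hyperparameter configurations, exploring parts of the configuration space beyond the reach of direct learning engine evaluation. Commonly, a surrogate is selected before the optimization is initialized and remains the same during the search. We investigated whether dynamically switching surrogates during the optimization itself is a sensible idea of practical relevance for selecting the most appropriate factorization machine-based models for large-scale online recommendation. We conducted benchmarks on datasets containing hundreds of millions of instances against established baselines such as random forest- and Gaussian process-based surrogates. The results indicate that surrogate switching can offer good performance while requiring fewer learning engine evaluations.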
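The surrogate loop with switching described in this abstract can be sketched on a toy problem. This is only an illustration under stated assumptions: the abstract does not say how a surrogate is chosen at each step, so the leave-one-out selection rule below is an assumption, as are the two simple surrogates, the one-dimensional search space, and the stand-in `engine_loss` function.

```python
import random

def engine_loss(x):
    """Stand-in for one expensive learning-engine evaluation (hypothetical)."""
    return (x - 0.3) ** 2

def nearest_neighbor(observed, x):
    """Surrogate 1: predict the loss of the closest evaluated configuration."""
    return min(observed, key=lambda p: abs(p[0] - x))[1]

def inverse_distance(observed, x):
    """Surrogate 2: inverse-distance-weighted average of observed losses."""
    num = den = 0.0
    for xi, yi in observed:
        w = 1.0 / (abs(xi - x) + 1e-9)
        num += w * yi
        den += w
    return num / den

SURROGATES = {"nearest": nearest_neighbor, "idw": inverse_distance}

def loo_error(surrogate, observed):
    """Mean leave-one-out absolute error on the configurations evaluated so far."""
    err = 0.0
    for i, (x, y) in enumerate(observed):
        rest = observed[:i] + observed[i + 1:]
        err += abs(surrogate(rest, x) - y)
    return err / len(observed)

def optimize(budget=15, pool_size=200, seed=0):
    rng = random.Random(seed)
    # A few initial engine evaluations to seed the surrogates.
    observed = [(x, engine_loss(x)) for x in [rng.random() for _ in range(3)]]
    used = []
    for _ in range(budget):
        # Assumed switching rule: use whichever surrogate currently fits best.
        name = min(SURROGATES, key=lambda n: loo_error(SURROGATES[n], observed))
        used.append(name)
        # Score many candidates cheaply, then evaluate only the most promising.
        pool = [rng.random() for _ in range(pool_size)]
        best = min(pool, key=lambda x: SURROGATES[name](observed, x))
        observed.append((best, engine_loss(best)))
    return min(observed, key=lambda p: p[1]), used

best_config, surrogates_used = optimize()
```

Each iteration scores 200 candidates with the surrogate but spends only one engine evaluation, which is the trade-off the abstract refers to. The sketch is purely exploitative; a practical optimizer would also balance exploration.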
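Concept induction, which is based on formal logical reasoning over description logics, has been used in ontology engineering to create ontology (TBox) axioms from the base data (ABox) graph. In this paper, we show that it can also be used to explain data differentials, for example in the context of explainable AI (XAI), and that this can in fact be done in a way that is meaningful to a human observer. Our approach uses a large class hierarchy, curated from the Wikipedia category hierarchy, as background knowledge.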
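Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem for many self-supervised learning approaches and can render self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In this paper, we argue that a straightforward application of information maximization among alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure reflecting the level of correlation among its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance among alternative representations by increasing the linear dependence among them. An approximation of the proposed information maximization objective simplifies to a Euclidean distance-based objective function regularized by the log-determinant of the feature covariance matrix. The regularization term acts as a natural barrier against feature space degeneracy. Consequently, beyond avoiding complete output collapse to a single point, the proposed approach also prevents dimensional collapse by encouraging the spread of information across the whole feature space. Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance relative to state-of-the-art SSL approaches.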
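The shape of the objective this abstract describes, a Euclidean distance term regularized by the log-determinant of the feature covariance, can be sketched as follows. This is an illustrative form only: the weighting `weight`, the diagonal offset `eps`, and the exact way the two terms combine are assumptions, not the published CorInfoMax loss.

```python
import math

def covariance(features):
    """Sample covariance of a list of equal-length feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    return [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
            for a in range(d)]

def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(d))

def sketch_loss(view_a, view_b, weight=1.0, eps=1e-3):
    """Mean squared distance between paired views minus weighted log-dets."""
    n = len(view_a)
    dist = sum(sum((x - y) ** 2 for x, y in zip(a, b))
               for a, b in zip(view_a, view_b)) / n
    reg = 0.0
    for view in (view_a, view_b):
        cov = covariance(view)
        for i in range(len(cov)):
            cov[i][i] += eps  # keeps the matrix positive definite
        reg += log_det_spd(cov)
    return dist - weight * reg

# Collapsed features (every input maps to the same point) vs. spread features.
collapsed = [[1.0, 1.0]] * 4
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
loss_collapsed = sketch_loss(collapsed, collapsed)
loss_spread = sketch_loss(spread, spread)
```

Both pairs of views match exactly, so the distance term is zero in each case; the collapsed features are penalized only through the log-determinant, which becomes strongly negative as the covariance approaches singularity. That is the barrier behavior the abstract attributes to the regularizer.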
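Foundation models (FMs) have demonstrated unprecedented capabilities, including zero-shot learning, high-fidelity data synthesis, and out-of-domain generalization. However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieving technical illustrations of car manuals from language queries), for which the data is either unseen or belongs to a long-tail part of the data distribution of the huge datasets used for FM pre-training. This underlines the necessity of explicitly evaluating and fine-tuning FMs on such expert tasks, which are arguably the ones that matter most in practical real-world applications. In this paper, we propose the FETA benchmark, built around the task of teaching FMs to understand technical documentation by learning to match its graphical illustrations to the corresponding language descriptions. Our FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. FETA is equipped with a procedure for fully automatic annotation extraction (code will be released upon acceptance), allowing FETA to be easily extended to more documentation types and application domains in the future. Our automatic annotation leads to an automated performance metric that is shown to be consistent with metrics computed on human-curated annotations (also released). We provide multiple baselines and an analysis of popular FMs on FETA, leading to several interesting findings that we believe will be very valuable to the FM community, paving the way toward real-world application of FMs to practical expert tasks currently "overlooked" by standard benchmarks focusing on common objects.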
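This paper proposes a novel inverse kinematics (IK) solver for articulated robotic systems, intended for path planning. IK is a traditional but essential problem in robot manipulation. Recently, data-driven methods have been proposed to solve IK quickly for path planning. These methods can handle a large number of IK requests at once by taking advantage of GPUs. However, their accuracy is still low, and the models require considerable training time. We therefore propose an IK solver that improves accuracy and memory efficiency by exploiting the continuous hidden dynamics of neural ODEs. Performance is compared using multiple robots.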